Abstract

In this paper we are interested in parameter estimation for a linear model when the number of parameters increases
with the sample size. Without any assumption on the moments of the model error, we propose and study the seamless
$L_0$ quantile estimator. For this estimator we first give the convergence rate. Afterwards, we prove that it
correctly distinguishes between zero and nonzero parameters and that the estimators of the nonzero parameters are
asymptotically normal. A consistent BIC criterion to select the tuning parameters is given.
1 Introduction


Consider a model where the number of regressors can increase with the sample size $n$:

$$Y_i = \mathbf{X}_i^t \boldsymbol{\beta}_n + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (1)$$

where $\boldsymbol{\beta}_n = (\beta_1, \dots, \beta_{d_n}) \in \mathbb{R}^{d_n}$ contains the regression parameters. The design vector $\mathbf{X}_i$, for observation $i$, is a deterministic vector of dimension $d_n \times 1$. The random variable $\varepsilon_i$ is the model error. Denote by $\boldsymbol{\beta}^0 = (\beta^0_1, \dots, \beta^0_{d_n})$ the true, unknown value of the parameter $\boldsymbol{\beta}_n$. In order to automatically select the nonzero components of $\boldsymbol{\beta}_n$ (and therefore the significant variables), the natural idea is to penalize the random optimization process with the $L_0$ "norm" (it is not a norm), defined by $\|\boldsymbol{\beta}_n\|_0 = \sum_{j=1}^{d_n} 1_{\beta_j \neq 0}$. This "norm" has the disadvantage of being discontinuous at 0, which makes the problem computationally infeasible, since all possible models (all possible combinations of $\beta_j \neq 0$) would have to be considered.
In this paper, we estimate the parameter $\boldsymbol{\beta}_n$ of (1) by penalizing the quantile process with a seamless $L_0$ penalty. The difficulty in studying this type of estimation method is that the quantile process is convex in $\boldsymbol{\beta}_n$ while the seamless $L_0$ penalty is concave.

In the literature on high-dimensional models, only the case of a quantile process penalized with a
convex penalty of type $L_1$ has been considered. Models with the number of variables exceeding the sample size ($d_n > n$) are studied by
[Belloni and Chernozhukov (2011)], [Fan et al. (2014a)], [Zheng et al. (2013)]. If $d_n < n$, the references [Wu and Liu (2009)],
[Zou and Yuan (2008)] considered variable selection in a quantile model with convex penalties.

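Penalized random processes of the type

$$G_n(\boldsymbol{\beta}_n) + Pen(\boldsymbol{\beta}_n), \qquad (2)$$

with the process $G_n(\boldsymbol{\beta}_n)$ convex in $\boldsymbol{\beta}_n$ and the penalty $Pen(\boldsymbol{\beta}_n)$ nonconvex, have been little studied. In [Fan and Peng (2004)],
$G_n(\boldsymbol{\beta})$ is the negative log-likelihood and the penalty is nonconvex, with $d_n^5/n \to 0$, as $n \to \infty$. For $d_n \gg n$, [Wang et al. (2014)]
considered, for the particular case of $Y | \mathbf{X} = \mathbf{x}$ sub-Gaussian, $G_n(\boldsymbol{\beta}_n)$ a loss function and $Pen(\boldsymbol{\beta}_n)$ a nonconvex
penalty. Also for $d_n \gg n$, [Zhang and Zhang (2012)] considered $G_n(\boldsymbol{\beta}_n) = (2n)^{-1} \sum_{i=1}^n (Y_i - \mathbf{X}_i^t \boldsymbol{\beta}_n)^2$ and $Pen(\boldsymbol{\beta}_n)$
concave. [Fan et al. (2014b)] proposed an estimation method based on one-step local linear approximation, when the
support set of $\boldsymbol{\beta}^0$ is known.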

To overcome the discontinuity at 0 of the $L_0$ "norm", [Dicker et al. (2013)] propose a seamless
$L_0$ penalty:

$$p_{SELO}(\beta_j) = \frac{\lambda_n}{\log 2} \log\Big( \frac{|\beta_j|}{|\beta_j| + \gamma_n} + 1 \Big), \qquad (3)$$

with $\lambda_n, \gamma_n > 0$ two tuning parameters. If $\gamma_n \to 0$, the $L_0$ penalty is obtained. [Dicker et al. (2013)] consider the least squares process
$G_n(\boldsymbol{\beta}_n) = (2n)^{-1} \sum_{i=1}^n (Y_i - \mathbf{X}_i^t \boldsymbol{\beta}_n)^2$ with $(\varepsilon_i)$ i.i.d., $E[\varepsilon_i] = 0$, $Var(\varepsilon_i) = \sigma^2$, assumptions under which the sparsity
and the asymptotic normality of the estimators are proved, if $d_n/n \to 0$, for $n \to \infty$. If $Y$ belongs to the exponential
family, [Li et al. (2012)] consider $G_n(\boldsymbol{\beta}_n) = -$loglikelihood$/n$, with penalty (3) but with a stronger constraint on $d_n$:
$d_n^5/n \to 0$ for $n \to \infty$.
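
As a small numerical illustration of the penalty just described, the following Python sketch evaluates the seamless $L_0$ penalty, assuming the form $(\lambda/\log 2)\log(|\beta|/(|\beta|+\gamma)+1)$ used in the rest of the paper, and compares it with the $L_0$ penalty $\lambda 1_{\beta \neq 0}$. The function names `selo_penalty` and `l0_penalty` are illustrative choices, not notation from the paper.

```python
import math


def selo_penalty(beta, lam, gamma):
    """Seamless-L0 penalty for one coefficient (assumed form, see lead-in)."""
    a = abs(beta)
    return (lam / math.log(2.0)) * math.log(a / (a + gamma) + 1.0)


def l0_penalty(beta, lam):
    """L0 penalty for one coefficient: lam if beta is nonzero, else 0."""
    return lam if beta != 0 else 0.0


if __name__ == "__main__":
    lam = 1.0
    for gamma in (1.0, 0.1, 0.001):
        values = [round(selo_penalty(b, lam, gamma), 4) for b in (0.0, 0.01, 0.5, 2.0)]
        print("gamma =", gamma, "->", values)
```

For a fixed nonzero coefficient the penalty value approaches $\lambda$ as $\gamma$ decreases, while the value at 0 stays equal to 0, which is the sense in which the penalty mimics $L_0$ while remaining continuous.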

If the law of the error $\varepsilon$ is unknown, or if the assumptions on the first two moments of the error are not satisfied,
then the likelihood and least squares methods with the seamless $L_0$ penalty cannot be used. This justifies the interest of the
present paper, where the quantile process is penalized with the seamless $L_0$ penalty (3).

We give some general notations. Throughout the paper, $C$ denotes a positive generic constant not depending on
$n$, which may take different values in different formulas or even in different parts of the same formula. All vectors and
matrices are in bold and all vectors are column vectors. For a vector $\mathbf{v}$, $\|\mathbf{v}\|_2$ is the Euclidean norm and $\mathbf{v}^t$ denotes the transpose
of $\mathbf{v}$. For a matrix $\mathbf{M}$, $\|\mathbf{M}\|_2$ is the norm subordinate to the vector norm $\|.\|_2$, and $\lambda_{\min}(\mathbf{M})$ and $\lambda_{\max}(\mathbf{M})$ are its smallest
and largest eigenvalues. We also use the notation $sgn(.)$ for the sign function and $tr(.)$ for the trace operator.

The paper is organized as follows. In Section 2, we introduce the seamless $L_0$ quantile estimator and study its convergence rate and
oracle properties. In Section 3 we propose a consistent BIC criterion to select the tuning parameters.
Finally, in Section 4, we present two lemmas useful to prove the main results.


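2 Seamless L0 quantile estimator

In this section we propose and study the seamless $L_0$ quantile estimator. For a fixed quantile index $\tau \in (0,1)$, the
seamless $L_0$ quantile estimator is the parameter which minimizes the process

$$Q_n(\boldsymbol{\beta}_n) = \frac{1}{2n} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \boldsymbol{\beta}_n) + \sum_{j=1}^{d_n} p_{SELO}(\beta_j),$$

with the function $\rho_\tau(.) : \mathbb{R} \to \mathbb{R}_+$ defined by $\rho_\tau(u) = u(\tau - 1_{u<0})$ and, for $\beta \in \mathbb{R}$,

$$p_{SELO}(\beta) = \frac{\lambda_n}{\log 2} \log\Big( \frac{|\beta|}{|\beta| + \gamma_n} + 1 \Big).$$

Then, the seamless $L_0$ quantile estimator is

$$\hat{\boldsymbol{\beta}}_n = \arg\min_{\boldsymbol{\beta}_n \in \mathbb{R}^{d_n}} Q_n(\boldsymbol{\beta}_n). \qquad (4)$$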
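
To make the objective minimized by the estimator concrete, here is a minimal Python sketch that evaluates the penalized quantile process for a given coefficient vector. It assumes the $1/(2n)$ normalization of the check-loss sum and the penalty form $(\lambda/\log 2)\log(|\beta|/(|\beta|+\gamma)+1)$; the function names and the list-of-rows data layout are illustrative assumptions. It only evaluates the objective and does not attempt the (nonconvex) minimization.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def selo_penalty(beta, lam, gamma):
    """Seamless-L0 penalty for one coefficient (assumed form, see lead-in)."""
    a = abs(beta)
    return (lam / math.log(2.0)) * math.log(a / (a + gamma) + 1.0)


def objective(beta, X, y, tau, lam, gamma):
    """Penalized quantile process: check loss scaled by an assumed 1/(2n), plus penalties."""
    n = len(y)
    residuals = [yi - sum(xij * bj for xij, bj in zip(xi, beta))
                 for xi, yi in zip(X, y)]
    loss = sum(check_loss(r, tau) for r in residuals) / (2.0 * n)
    penalty = sum(selo_penalty(b, lam, gamma) for b in beta)
    return loss + penalty


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, -1.0]
    print(objective([0.0, 0.0], X, y, tau=0.5, lam=1.0, gamma=0.1))
```

Setting all coefficients to zero removes the penalty and leaves only the scaled check loss, whereas a perfect fit removes the loss and leaves only the penalty; the estimator trades these two terms off.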


Remark 1 We emphasize that the results of [Fan et al. (2014b)], where a concave penalty is considered for the quantile
process, cannot be applied in the present paper, because our penalty cannot be written as $\|\mathbf{c} \circ \boldsymbol{\beta}\|_1$.

For the errors $(\varepsilon_i)$ of model (1), we consider the following assumption:

(A1) $(\varepsilon_i)_{1 \le i \le n}$ are i.i.d., with distribution function $F$ and density function $f$. The density function $f$ is continuous
and strictly positive in a neighborhood of zero and has a bounded first derivative in a neighborhood of 0. The $\tau$th
quantile of $\varepsilon_i$ is zero: $\tau = F(0)$.

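Let us denote $\alpha_n = (d_n/n)^{1/2}$. For the deterministic design $(\mathbf{X}_i)_{1 \le i \le n}$ we suppose that:

(A2) there exist constants $0 < r_0 < R_0 < \infty$ such that $r_0 \le \lambda_{\min}(n^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t) \le \lambda_{\max}(n^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t) \le R_0$.

(A3) $\max_{1 \le i \le n} \|\mathbf{X}_i\|_2 = o(\alpha_n^{-1})$.

On the tuning parameters $\lambda_n$, $\gamma_n$ and on the dimension $d_n$, we suppose:

(A4) $d_n$ is such that $d_n/n \to 0$, as $n \to \infty$.

(A5) $\lambda_n = O(1)$, $\lambda_n \sqrt{n/d_n} \to \infty$ and $\gamma_n =$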


Assumptions (A1), (A2) are standard for a linear model and (A3) is classical for a high-dimensional model. Assumptions
(A4), (A5) are needed for the study of statistical inference on $\boldsymbol{\beta}_n$ (see e.g. [Dicker et al. (2013)], [Lee et al. (2014)]).


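For $\boldsymbol{\beta}_n \in \mathbb{R}^{d_n}$, let $G_n$ be the difference between two quantile processes:

$$G_n(\boldsymbol{\beta}_n) = \sum_{i=1}^n \big[ \rho_\tau(\varepsilon_i - \mathbf{X}_i^t \boldsymbol{\beta}_n) - \rho_\tau(\varepsilon_i) \big]. \qquad (5)$$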
The following theorem states that the estimator $\hat{\boldsymbol{\beta}}_n$ has a convergence rate of order $\alpha_n$. If $d_n$ is bounded, we find the
classic convergence rate $n^{-1/2}$ of the quantile estimator for a finite-dimensional model (see [Knight (1998)]).


Theorem 1 Under assumptions (A1)-(A5), we have: $\|\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta}^0\|_2 = O_P(\alpha_n)$.

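Proof. In order to prove the theorem, we show that for all $\epsilon \in (0,1)$, there exists a constant $B > 0$ large enough such
that, for any $n$ large enough,

$$P\Big[ \inf_{\|\mathbf{u}\|_2 = 1} Q_n(\boldsymbol{\beta}^0 + B \alpha_n \mathbf{u}) > Q_n(\boldsymbol{\beta}^0) \Big] \ge 1 - \epsilon. \qquad (6)$$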


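Fix $\epsilon \in (0,1)$ and consider some $\mathbf{u} = (u_1, \dots, u_{d_n}) \in \mathbb{R}^{d_n}$ with $\|\mathbf{u}\|_2 = 1$. Let $c > 0$ be some constant. Consider

$$Q_n(\boldsymbol{\beta}^0 + c \alpha_n \mathbf{u}) - Q_n(\boldsymbol{\beta}^0) = \frac{1}{2n} G_n(c \alpha_n \mathbf{u}) + \sum_{j=1}^{d_n} \big[ p_{SELO}(\beta^0_j + c \alpha_n u_j) - p_{SELO}(\beta^0_j) \big]. \qquad (7)$$

For the penalty, we have the following inequality:

$$\sum_{j=1}^{d_n} \big[ p_{SELO}(\beta^0_j + c \alpha_n u_j) - p_{SELO}(\beta^0_j) \big] \ge \sum_{j \in J(\mathbf{u})} \big[ p_{SELO}(\beta^0_j + c \alpha_n u_j) - p_{SELO}(\beta^0_j) \big],$$

where $J(\mathbf{u}) = \{ l \in \{1, \dots, d_n\}; \; p_{SELO}(\beta^0_l + c \alpha_n u_l) - p_{SELO}(\beta^0_l) < 0 \}$. Because $c$, $\mathbf{u}$ are fixed and $\alpha_n \to 0$, then by
Lemma 1, for all $j \in J(\mathbf{u})$ and for large enough $n$, there exists $C_j > 0$ such that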

Pselo{I 3° + canUj)-psELo{P°) = ^ [ff(/3° + ca„u,) - g(/3°)] = 


Thus, by assumptions (A4) and (A5), we have: 


X! [PSELo[li'j + cUnUj) -psELoiP^j)] > -^ | =- 0 (A„a„ 7 „d„) =-0(a„a^/^) =-o(a^). (8) 


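We now study the expectation of $G_n(c \alpha_n \mathbf{u})$:

$$E[G_n(c \alpha_n \mathbf{u})] = \sum_{i=1}^n E\big[ \rho_\tau(\varepsilon_i - c \alpha_n \mathbf{X}_i^t \mathbf{u}) - \rho_\tau(\varepsilon_i) \big] = \sum_{i=1}^n E\Big[ \int_0^{c \alpha_n \mathbf{X}_i^t \mathbf{u}} 1_{0 < \varepsilon_i < t} \, dt \Big] = \sum_{i=1}^n \int_0^{c \alpha_n \mathbf{X}_i^t \mathbf{u}} [F(t) - F(0)] \, dt.$$

On the other hand, by (A1), for $v \to 0$, we have $\int_0^v [F(t) - F(0)] \, dt = f(0) v^2/2 + o(v^2)$. Using (A3), we have:

$$\sum_{i=1}^n \int_0^{c \alpha_n \mathbf{X}_i^t \mathbf{u}} [F(t) - F(0)] \, dt = \frac{f(0)}{2} c^2 \alpha_n^2 \sum_{i=1}^n (\mathbf{X}_i^t \mathbf{u})^2 + o\Big( \alpha_n^2 \sum_{i=1}^n \mathbf{u}^t (\mathbf{X}_i \mathbf{X}_i^t) \mathbf{u} \Big).$$

Then

$$\frac{1}{n} E[G_n(c \alpha_n \mathbf{u})] = f(0) \frac{c^2 \alpha_n^2}{2} \mathbf{u}^t \Big( \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t \Big) \mathbf{u} \, (1 + o(1)). \qquad (9)$$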


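Consider now the random variables $D_i = (1 - \tau) 1_{\varepsilon_i < 0} - \tau 1_{\varepsilon_i \ge 0}$, $R_i = \rho_\tau(\varepsilon_i - c \alpha_n \mathbf{X}_i^t \mathbf{u}) - \rho_\tau(\varepsilon_i) - c \alpha_n D_i \mathbf{X}_i^t \mathbf{u}$ and the
random vector $\mathbf{W}_n = c \alpha_n \sum_{i=1}^n D_i \mathbf{X}_i^t$. Thus, the process $G_n$ can be written:

$$G_n(c \alpha_n \mathbf{u}) = E[G_n(c \alpha_n \mathbf{u})] + \mathbf{W}_n \mathbf{u} + \sum_{i=1}^n \big[ R_i - E[R_i] \big].$$

But, since by (A1) the errors $(\varepsilon_i)$ are independent, using also $|R_i| \le |c \alpha_n \mathbf{X}_i^t \mathbf{u}| \, 1_{|\varepsilon_i| \le |c \alpha_n \mathbf{X}_i^t \mathbf{u}|}$, we have:

$$E\Big[ \sum_{i=1}^n \big[ R_i - E[R_i] \big] \Big]^2 = \sum_{i=1}^n E\big[ R_i - E[R_i] \big]^2 \le \sum_{i=1}^n |c \alpha_n \mathbf{X}_i^t \mathbf{u}|^2 \, E\big[ 1_{|\varepsilon_i| \le |c \alpha_n \mathbf{X}_i^t \mathbf{u}|} \big].$$

Taking into account assumptions (A1) and (A3), we have $E[1_{|\varepsilon_i| \le |c \alpha_n \mathbf{X}_i^t \mathbf{u}|}] = F(|c \alpha_n \mathbf{X}_i^t \mathbf{u}|) - F(-|c \alpha_n \mathbf{X}_i^t \mathbf{u}|) \le
C \alpha_n |\mathbf{X}_i^t \mathbf{u}| \le C \alpha_n \max_{1 \le i \le n} \|\mathbf{X}_i\|_2 = o(1)$, with $C > 0$. Then, using assumption (A2), we obtain:

$$E\Big[ \sum_{i=1}^n \big[ R_i - E[R_i] \big] \Big]^2 = o\Big( \alpha_n^2 \sum_{i=1}^n (\mathbf{X}_i^t \mathbf{u})^2 \Big) = o(d_n). \qquad (10)$$

Consider now the random variable $U_n = d_n^{-1/2} \sum_{i=1}^n [R_i - E[R_i]]$. Taking into account (10), we have $E[U_n^2] = o(1)$.

Since $E[U_n] = 0$, by the Bienaymé-Chebyshev inequality, we have $U_n \to 0$ in probability. Thus $\sum_{i=1}^n [R_i - E[R_i]] = o_P(d_n^{1/2})$.

Returning to $G_n$, we have, taking into account (10):

$$G_n(c \alpha_n \mathbf{u}) = E[G_n(c \alpha_n \mathbf{u})] + \mathbf{W}_n \mathbf{u} + o_P(d_n^{1/2})$$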

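or again

$$G_n(c \alpha_n \mathbf{u}) = \Big[ f(0) \frac{c^2 d_n}{2} \mathbf{u}^t \Big( \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t \Big) \mathbf{u} + c \alpha_n \sum_{i=1}^n D_i \mathbf{X}_i^t \mathbf{u} \Big] (1 + o_P(1)) + o_P(d_n^{1/2}). \qquad (11)$$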


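Since $n^{-1/2} \sum_{i=1}^n D_i \mathbf{X}_i^t \mathbf{u}$ converges in distribution to a centered normal distribution, by assumptions (A4) and (A2),
for a large enough constant $B$, the first term of the right-hand side dominates in (11). Then,

$$\frac{1}{n} G_n(B \alpha_n \mathbf{u}) = \frac{1}{2} f(0) B^2 \alpha_n^2 \, \mathbf{u}^t \Big( \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t \Big) \mathbf{u} \, (1 + o_P(1)). \qquad (12)$$

Thus, for $n$ and $B$ large enough, we have $(2n)^{-1} G_n(B \alpha_n \mathbf{u}) > 0$. On the other hand, by relations (7) and (8),
$Q_n(\boldsymbol{\beta}^0 + B \alpha_n \mathbf{u}) - Q_n(\boldsymbol{\beta}^0) \ge (2n)^{-1} G_n(B \alpha_n \mathbf{u}) - o(\alpha_n^2)$. Taking also into account relation (12) and assumption (A2),
we obtain (6). ■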


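Let us consider the parameter set, with the constant $B > 0$ of relation (6):

$$\mathcal{V}_{\alpha_n}(\boldsymbol{\beta}^0) = \{ \boldsymbol{\beta}_n \in \mathbb{R}^{d_n}; \; \|\boldsymbol{\beta}_n - \boldsymbol{\beta}^0\|_2 \le B \alpha_n \}.$$

According to Theorem 1, the seamless $L_0$ quantile estimator belongs to $\mathcal{V}_{\alpha_n}(\boldsymbol{\beta}^0)$ with a probability converging to 1.
For an index set $\mathcal{A} \subseteq \{1, \dots, d_n\}$, we denote by $|\mathcal{A}|$ its cardinality. Throughout the paper, we denote by $\boldsymbol{\beta}_{\mathcal{A}}$
the sub-vector of $\boldsymbol{\beta}_n$ containing the components with indices in $\mathcal{A}$, and similarly for $\mathbf{X}_{i,\mathcal{A}}$. Consider also the following
index set:

$$\mathcal{A}^0 = \{ j \in \{1, \dots, d_n\}; \; \beta^0_j \neq 0 \}. \qquad (13)$$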


The following theorem gives the oracle properties for the estimator $\hat{\boldsymbol{\beta}}_n = (\hat\beta_1, \dots, \hat\beta_{d_n})$ defined by (4). Note that,
with respect to the paper of [Dicker et al. (2013)], for showing the normality of the nonzero estimators, the condition
$E[|\varepsilon_i/\sigma|^{2+\delta}] < M$ is not needed, for some $\delta > 0$ and $M < \infty$.


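Theorem 2 Under assumptions (A1)-(A5), we have
(i) $\lim_{n \to \infty} P\big[ \{ j \in \{1, \dots, d_n\}; \; \hat\beta_j \neq 0 \} = \mathcal{A}^0 \big] = 1$.

(ii) For any vector $\mathbf{u}$ of dimension $|\mathcal{A}^0|$ such that $\|\mathbf{u}\|_2 = 1$, if we denote $\mathbf{S}_{\mathcal{A}^0} = n^{-1} \sum_{i=1}^n \mathbf{X}_{i,\mathcal{A}^0} \mathbf{X}_{i,\mathcal{A}^0}^t$, then

$$\sqrt{n} \, (\mathbf{u}^t \mathbf{S}_{\mathcal{A}^0}^{-1} \mathbf{u})^{-1/2} \, \mathbf{u}^t (\hat{\boldsymbol{\beta}}_{\mathcal{A}^0} - \boldsymbol{\beta}^0_{\mathcal{A}^0}) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\Big( 0, \frac{\tau(1-\tau)}{f^2(0)} \Big).$$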

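Proof. (i) If we denote by $\mathcal{A}^{0c}$ the complementary set of $\mathcal{A}^0$ in $\{1, \dots, d_n\}$, we will prove that for any $\boldsymbol{\beta}_n =
(\boldsymbol{\beta}_{\mathcal{A}^0}, \boldsymbol{\beta}_{\mathcal{A}^{0c}}) \in \mathcal{V}_{\alpha_n}(\boldsymbol{\beta}^0)$ such that $\|\boldsymbol{\beta}_{\mathcal{A}^0} - \boldsymbol{\beta}^0_{\mathcal{A}^0}\|_2 = O_P(\alpha_n)$ and any constant $C > 0$, we have

$$Q_n((\boldsymbol{\beta}_{\mathcal{A}^0}, \mathbf{0})) = \min_{\|\boldsymbol{\beta}_{\mathcal{A}^{0c}}\|_2 \le C \alpha_n} Q_n((\boldsymbol{\beta}_{\mathcal{A}^0}, \boldsymbol{\beta}_{\mathcal{A}^{0c}})). \qquad (14)$$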










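Consider the following parameter set $\mathcal{W}_n = \{ \boldsymbol{\beta}_n \in \mathcal{V}_{\alpha_n}(\boldsymbol{\beta}^0); \; \|\boldsymbol{\beta}_{\mathcal{A}^{0c}}\|_2 > 0 \}$. We show that $P[\hat{\boldsymbol{\beta}}_n \in \mathcal{W}_n] \to 0$, as $n \to \infty$.
Let $\boldsymbol{\beta}_n = (\boldsymbol{\beta}_{\mathcal{A}^0}, \boldsymbol{\beta}_{\mathcal{A}^{0c}}) \in \mathcal{W}_n$ and another parameter $\tilde{\boldsymbol{\beta}}_n = (\tilde{\boldsymbol{\beta}}_{\mathcal{A}^0}, \tilde{\boldsymbol{\beta}}_{\mathcal{A}^{0c}}) \in \mathcal{V}_{\alpha_n}(\boldsymbol{\beta}^0)$, such that $\tilde{\boldsymbol{\beta}}_{\mathcal{A}^0} = \boldsymbol{\beta}_{\mathcal{A}^0}$ and
$\tilde{\boldsymbol{\beta}}_{\mathcal{A}^{0c}} = \mathbf{0}$. Define

$$D_n(\boldsymbol{\beta}_n, \tilde{\boldsymbol{\beta}}_n) = Q_n(\boldsymbol{\beta}_n) - Q_n(\tilde{\boldsymbol{\beta}}_n) = \frac{1}{2n} \sum_{i=1}^n \big[ \rho_\tau(Y_i - \mathbf{X}_i^t \boldsymbol{\beta}_n) - \rho_\tau(Y_i - \mathbf{X}_i^t \tilde{\boldsymbol{\beta}}_n) \big] + \sum_{j \in \mathcal{A}^{0c}} p_{SELO}(\beta_j). \qquad (15)$$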

i=l 


Concerning the penalty of relation (15), as in the proof of Lemma A.2 of [Dicker et al. (2013)], relation (A.7), we have
that there exists $C > 0$ such that

E PSELoWl) > log + 1)11/3. - Mp 




On the other hand, by assumption (A3), we have that there exists Ci > 0 such that liminf„_>oo (log {C/{C + 7nOn) + 
l)) > Cl > 0. Then, for n large enough, there exists C > 0 such that 


E 


PSELoiPj) 


>CA„. 


ll/^n /^nlls 

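Consider the following identity, which follows from [Knight (1998)]: for any $x, y \in \mathbb{R}$,

$$\rho_\tau(x - y) - \rho_\tau(x) = y (1_{x < 0} - \tau) + \int_0^y (1_{x < t} - 1_{x < 0}) \, dt. \qquad (16)$$

Using this relation for the first sum of (15), we obtain:

$$\frac{1}{2n} \sum_{i=1}^n \big[ \rho_\tau(Y_i - \mathbf{X}_i^t \boldsymbol{\beta}_n) - \rho_\tau(Y_i - \mathbf{X}_i^t \tilde{\boldsymbol{\beta}}_n) \big] = \frac{1}{2n} (\boldsymbol{\beta}_n - \tilde{\boldsymbol{\beta}}_n)^t \sum_{i=1}^n \mathbf{X}_i \big[ 1_{Y_i - \mathbf{X}_i^t \tilde{\boldsymbol{\beta}}_n < 0} - \tau \big] + \frac{1}{2n} \sum_{i=1}^n \int_0^{\mathbf{X}_i^t (\boldsymbol{\beta}_n - \tilde{\boldsymbol{\beta}}_n)} \big[ 1_{Y_i - \mathbf{X}_i^t \tilde{\boldsymbol{\beta}}_n < t} - 1_{Y_i - \mathbf{X}_i^t \tilde{\boldsymbol{\beta}}_n < 0} \big] \, dt \equiv T_{1n} + T_{2n}. \qquad (17)$$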


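For $T_{1n}$ we have, by assumption (A3) and since the density $f$ is bounded in a neighborhood of 0:

$$E[T_{1n}] = (\boldsymbol{\beta}_n - \tilde{\boldsymbol{\beta}}_n)^t \frac{1}{2n} \sum_{i=1}^n \mathbf{X}_i \big[ F(\mathbf{X}_i^t (\tilde{\boldsymbol{\beta}}_n - \boldsymbol{\beta}^0)) - F(0) \big] = (\boldsymbol{\beta}_n - \tilde{\boldsymbol{\beta}}_n)^t \Big( \frac{1}{2n} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t \Big) (\tilde{\boldsymbol{\beta}}_n - \boldsymbol{\beta}^0) f(0) (1 + o(1)).$$

Then $|E[T_{1n}]| \le \|\boldsymbol{\beta}_n - \tilde{\boldsymbol{\beta}}_n\|_2 \, \|(2n)^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t\|_2 \, \|\boldsymbol{\beta}^0 - \tilde{\boldsymbol{\beta}}_n\|_2 \, f(0) (1 + o(1))$. Since the matrix $n^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t$ is
Hermitian, we have that $\|n^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t\|_2 = \lambda_{\max}(n^{-1} \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^t) \le R_0$. Hence, by (A2), we have $|E[T_{1n}]| \le$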
ll/3n - '^nhWfJn “ 3nll2^o/(0). Therefore 1 E;[Ti„] = 0(||/3„ - - ^nlb) = OQI^n - ^nM)- By calculations 

analogous to ]E[Tin], using independence of e^, we have that ]E[Ti^] = Cn“^||/3„ — /3„||^ —>■ 0, for n —>• oo. Since 
Var\Tin] < lE\Ti^, using BienaymCTchebychev inequality, we obtain 


ri„ = c||/3„-^j|2(i + op(i)). 


(18) 


Study now T 2 „ of (fT71) . which can be written as: T 2 „ = n ^ <t-x*(/ 3“-3 ' ~ ^ 

Then, taking into account that /3„ G Va„(/3°), together with assumptions (Al), (A3), we have 


0 l""ei<t-X*(/3“-/3„) ^Ei<-X*(/30 


®KJ= 1 E 


^X*(/3„-/3„) 


rX*(/3„-/3„) 


[b/(X‘(^„-/3°))+o(t)]df. 


[F(t-X‘(^°-^J)-F(-X‘(/3°-/3„))]d< = - ^ 

By Theorem [U together with assumptions (Al), (A3), we have that f(X.l{f3„ — /3°)) is bounded by a constant 
C G (0, oo). Thus, as for Ti„, using assumption (A2) and the fact that llXilH — tr(n“^ X^X*) —>• 0, 














we have |®[T 2 „]| < Cn'i l|X,|l|||/3„ - W\Pn - ^Ih + o{n-^ ^”=1 X‘(/3„ - /3J) = C||/3„ - (3Jl We show 
similarly that _25[T2„] = — /3„|P- Then, by Bienayme-Tchebychev inequality, we have: 

T2„ = C||/3„-3J|^(1 + op(1)). (19) 

Hence, by relations dm), dill), we obtain 
Tm + T2„ = C||/3„ - ;3J|2(1 + op(l)). 

Thus, taking into account this last relation together with relations dni), dun, dni): and since /3„ G >V„, we have: 
£>„(/3„,3„)||/3n - > C'll/3n “ '^nh + 5A„.^Since ||/3„ - 3nll = 0(an) and A„/q:„ -)> oo by (A5), we have that 

there exists C+ > 0 such that I?n(/3„,/3„)||/9„ — ^n\\ 2 ^ > C'+An > 0. But for /3°, taking into account the definition 
of and that of £)„(/3°,3n) = Qn(/3°) - we have that i:)„(/3°,3„) = C'll/S” - 3„lli(l + op(l)). Then, by 

(A3), we have IP[/3„ £ W„] —t 0 and relation (fTTl) follows. 


(a) Taking into account the estimator convergence rate obtained by Theorem [T] and claim (i)^ the estimator f3„ 
can be written + q;„( 5, with, 8 = ((5i, • • • , <5d„) G K'^", 8j^oc = 0 and ||<5^o||2 < ClAl®!. Consider then 


Q„(/3° + anS) - QM) = ^ + c^nS)) - pM)\ + V, 


( 20 ) 


with V = EjeAO PSELoWj + CfnSj) — PsELoiP^)- Let US first study V. For any j G Al°, by Lemma[Tl we have that 
there exists a constant Cj such that 

PSELoiPj + anSj) - PSELoiPj) = + OinSj) - g(^°)] = (|/3° + | “ |^°|) Cj, 

with \Cj\ < 00 , for any j G Al°. Since a„ —>■ 0, |/3°| > C > 0, Vj G and 6 j bounded, we have that for n large enough, 
the parameters (3^ + anSj and P'3 have the same sign. Then 


V = C^7„a„ ^ (±5,) = CA„a„7„|Al°|. 

For the first term of the right-hand side of (1201) we have: 


1 ” (T' ” 1 ” panels 

^ “ a„Xjd) - PriSi)] = -^ E ^ E / 


2n 


[flei<t ~ ll£i<o]'^^ = Ji + J2- 


i—1 i—1 

Since lE\Ji\ = 0, using independence of (sj), assumption (A4) and the Cauchy-Schwarz inequality, we get that 


2 2 r! 

VaT{J,) < ]E[Jl] = |^r(l - r) ^(X*^)^ < |^r(l - r) ^ ||X*^o||2||<5^o |^ 0 | < ^2 ^ ^ o_ 

i—1 i—1 


For J 2 we have: 




n ranXS 


n 

(t/(0) + o{f)) dt = -fi0)als \-^ X.X‘)5(1 + 0(1)). 


( 21 ) 


( 22 ) 


Usingassumption(A2), wehavethat/(0)a2||^||2.;^^.^(j^ < f{0)al\\8\\l-X^^^{n 

Taking into account the fact that = ||^^o ||2 < ClAl^l, we have 1B[J2] = C'/(0)a^|Al°|. We prove similarly 

Var{J 2 ) = 0(n“^Q:^|Al°|). We compare q;^|AI°| with A„q;„ 7 ti|AI°| obtained by (1^ for the penalty, ~ 

By (A5), jn = O (^^5 thus ^ Then ^—>• 00 , as n — 00 . Thus, minimizing (1^ amounts to 

minimizing Ji + J 2 , with respect to and. Using (I22L we obtain: 


^ n n ^ 

^ + a„S)) - pr{e^)] = ^ E XUo<5^o[ll,,<o - r] + -f{Q)al8\ol3A<^8j,o{l + op(1)). (23) 










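The minimizer of (23) is:

$$\boldsymbol{\delta}_{\mathcal{A}^0} = - \frac{1}{\alpha_n} \frac{1}{f(0)} \mathbf{S}_{\mathcal{A}^0}^{-1} \frac{1}{n} \sum_{i=1}^n \mathbf{X}_{i,\mathcal{A}^0} (1_{\varepsilon_i < 0} - \tau). \qquad (24)$$

For studying (24), let us consider the following sequence of independent random variables $W_i = (f(0))^{-1} \mathbf{u}^t \mathbf{S}_{\mathcal{A}^0}^{-1} \mathbf{X}_{i,\mathcal{A}^0} (1_{\varepsilon_i < 0} - \tau)$,
with $\mathbf{u}$ a vector of dimension $|\mathcal{A}^0|$ such that $\|\mathbf{u}\|_2 = 1$. We have that $E[W_i] = 0$ and $Var(\sum_{i=1}^n W_i) =
n \tau (1 - \tau) (f(0))^{-2} \mathbf{u}^t \mathbf{S}_{\mathcal{A}^0}^{-1} \mathbf{u}$. Then, by the CLT for sequences of independent random variables $(W_i)$, we have

$$\frac{\sqrt{n} \, f(0)}{\sqrt{\tau (1 - \tau)}} \, \frac{\mathbf{u}^t (\hat{\boldsymbol{\beta}}_{\mathcal{A}^0} - \boldsymbol{\beta}^0_{\mathcal{A}^0})}{(\mathbf{u}^t \mathbf{S}_{\mathcal{A}^0}^{-1} \mathbf{u})^{1/2}} \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}(0, 1). \qquad (25)$$

Claim (ii) results taking into account the fact that $\hat{\boldsymbol{\beta}}_{\mathcal{A}^0} - \boldsymbol{\beta}^0_{\mathcal{A}^0} = \alpha_n \boldsymbol{\delta}_{\mathcal{A}^0}$ and relations (24), (25). ■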

Remark 2 The cardinality of the set $\mathcal{A}^0$ may depend on $n$ and converge to $\infty$ as $n \to \infty$.
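
The proof of Theorem 2 relies on the identity of [Knight (1998)], $\rho_\tau(x - y) - \rho_\tau(x) = y(1_{x<0} - \tau) + \int_0^y (1_{x<t} - 1_{x<0}) \, dt$. The following Python sketch checks this identity numerically at a few points, as a sanity check rather than a proof. The integral is approximated by a midpoint rule, so agreement is only expected up to the step size; the function names and the number of steps are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def knight_lhs(x, y, tau):
    """Left-hand side: rho_tau(x - y) - rho_tau(x)."""
    return check_loss(x - y, tau) - check_loss(x, tau)


def knight_rhs(x, y, tau, steps=20000):
    """Right-hand side, with the integral over [0, y] approximated by a midpoint rule."""
    def ind(cond):
        return 1.0 if cond else 0.0

    h = y / steps  # signed step, so negative y is handled as well
    integral = sum(ind(x < (k + 0.5) * h) - ind(x < 0) for k in range(steps)) * h
    return y * (ind(x < 0) - tau) + integral


if __name__ == "__main__":
    for x, y, tau in [(1.0, 2.5, 0.3), (-1.0, -2.0, 0.7), (0.7, -1.2, 0.25)]:
        print(x, y, tau, knight_lhs(x, y, tau), knight_rhs(x, y, tau))
```

The first term of the right-hand side is linear in $y$ and centered when $x$ is replaced by an error with $\tau$th quantile zero, which is why the proofs treat it as the stochastic leading term and the integral as a remainder.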


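3 Tuning parameter selection


In this section we propose a BIC-type criterion to select the tuning parameters $\lambda$ and $\gamma$. This criterion will also
estimate the set $\mathcal{A}^0$, defined by (13). We start by introducing some notation.

• $\mathcal{A}_n$ is some index set, $\mathcal{A}_n \subseteq \{1, \dots, d_n\}$, which does not depend on the tuning parameters.

• $(\lambda, \gamma) \in (0, \infty)^2$ are some tuning parameters, which do not depend on $n$.

• $\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)$ is the seamless $L_0$ quantile estimator of $\boldsymbol{\beta}_{\mathcal{A}_n}$ obtained on some index set $\mathcal{A}_n \subseteq \{1, \dots, d_n\}$ and with $\lambda, \gamma$ as
tuning parameters. We denote its components by $\hat\beta_{\mathcal{A}_n, j}(\lambda, \gamma)$, for $j \in \mathcal{A}_n$.

• $\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)$ is the seamless $L_0$ quantile estimator of $\boldsymbol{\beta}$ obtained on the index set $\{1, \dots, d_n\}$, with $(\lambda_n, \gamma_n)$ as tuning
parameters. Then $\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n) = \hat{\boldsymbol{\beta}}_n$, with $\hat{\boldsymbol{\beta}}_n$ obtained by (4). We denote its components by $\hat\beta_j(\lambda_n, \gamma_n)$, for $j \in \{1, \dots, d_n\}$.

• $\hat{\mathcal{A}}_{(\lambda_n, \gamma_n)} = \{ j \in \{1, \dots, d_n\}; \; \hat\beta_j(\lambda_n, \gamma_n) \neq 0 \}$.

• $(\lambda_n, \gamma_n)$ is a tuning parameter sequence such that: $\lim_{n \to \infty} P[\hat{\mathcal{A}}_{(\lambda_n, \gamma_n)} = \mathcal{A}^0] = 1$.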


In order to define the BIC criterion, let us consider $(S_n)_{n \ge 1}$, a sequence of real numbers, defined as:
• if $d_n / \log n = o(1)$, we consider $S_n = 1$ for any $n \in \mathbb{N}$;

• if $d_n / \log n \neq o(1)$, we consider $(S_n)$ a sequence converging to $\infty$ such that


fog^i aO 


Sn log n 




0 . 


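In order to select $\mathcal{A}_n$, $\lambda$ and $\gamma$, we consider the following BIC criterion:

$$BIC(\mathcal{A}_n; (\lambda, \gamma)) = \log\Big( \frac{1}{n} \sum_{i=1}^n \rho_\tau\big( Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma) \big) \Big) + \frac{\log n}{n} S_n \|\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)\|_0, \qquad (26)$$

with $\|\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)\|_0 = \sum_{j \in \mathcal{A}_n} 1_{\hat\beta_{\mathcal{A}_n, j}(\lambda, \gamma) \neq 0}$. For the tuning parameters $\lambda_n$, $\gamma_n$ and the estimator $\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)$, let us consider
the value of the BIC criterion corresponding to (26):

$$BIC(\lambda_n, \gamma_n) = \log\Big( \frac{1}{n} \sum_{i=1}^n \rho_\tau\big( Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n) \big) \Big) + \frac{\log n}{n} S_n \|\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)\|_0.$$

If the conditions of Theorem 2 are satisfied, then $\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)$ satisfies the sparsity property

$$\lim_{n \to \infty} P\big[ \{ j \in \{1, \dots, d_n\}; \; \hat\beta_j(\lambda_n, \gamma_n) \neq 0 \} = \mathcal{A}^0 \big] = 1.$$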
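
To illustrate how a criterion of this type is evaluated once an estimate is available, the following Python sketch computes the logarithm of the mean check loss plus a complexity term proportional to the number of nonzero coefficients. The weight `s_n * log(n) / n` on the number of nonzero coefficients is an assumed reading of the penalty term, and the function name `bic` and its arguments are illustrative.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def bic(residuals, n_nonzero, tau, s_n=1.0):
    """log(mean check loss) + (assumed weight s_n * log(n) / n) * number of nonzero coefficients."""
    n = len(residuals)
    mean_loss = sum(check_loss(r, tau) for r in residuals) / n
    if mean_loss <= 0:
        raise ValueError("mean check loss must be positive to take its logarithm")
    return math.log(mean_loss) + s_n * math.log(n) / n * n_nonzero


if __name__ == "__main__":
    residuals = [0.3, -0.2, 0.1, -0.4, 0.25, -0.15]
    for k in (1, 2, 3):
        print(k, bic(residuals, n_nonzero=k, tau=0.5))
```

With the residuals held fixed, the criterion increases linearly in the number of nonzero coefficients, so a larger model is preferred only if it reduces the mean check loss enough to offset the complexity term.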

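In order to prove, by the following theorem, that the BIC criterion correctly selects, with a probability converging
to 1, the tuning parameters $\lambda$ and $\gamma$, we will consider index sets $\mathcal{A}_n$ such that $|\mathcal{A}_n| \le s_n$, with the assumption
$s_n = O(n^a)$, $0 < a < 1/2$. Consider also two collections of index sets $\mathcal{A}_{1n}$ and $\mathcal{A}_{2n}$:

$$\mathcal{A}_{1n} = \{ \mathcal{A}_n; \; \mathcal{A}^0 \subset \mathcal{A}_n, \; \mathcal{A}^0 \neq \mathcal{A}_n, \; |\mathcal{A}_n| \le s_n \}, \qquad \mathcal{A}_{2n} = \{ \mathcal{A}_n; \; \mathcal{A}^0 \not\subset \mathcal{A}_n, \; |\mathcal{A}_n| \le s_n \}.$$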
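Theorem 3 We suppose that $0 < E[\rho_\tau(\varepsilon)] < \infty$. Then, if instead of assumption (A4) we take $n^{(a-1)/2} d_n^{1/2} \to 0$ as
$n \to \infty$, under (A1)-(A3), (A5), we have:

$$\lim_{n \to \infty} P\Big[ \min_{\mathcal{A}_n \subseteq \{1, \dots, d_n\}, \, (\lambda, \gamma) \in (0, \infty)^2} BIC(\mathcal{A}_n; (\lambda, \gamma)) = BIC(\lambda_n, \gamma_n) \Big] = 1.$$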


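Proof. The theorem is proved if the following two statements are shown:

$$\lim_{n \to \infty} P\Big[ \min_{\mathcal{A}_n \in \mathcal{A}_{1n}} BIC(\mathcal{A}_n; (\lambda, \gamma)) > BIC(\lambda_n, \gamma_n) \Big] = 1, \qquad (27)$$

$$\lim_{n \to \infty} P\Big[ \min_{\mathcal{A}_n \in \mathcal{A}_{2n}} BIC(\mathcal{A}_n; (\lambda, \gamma)) > BIC(\lambda_n, \gamma_n) \Big] = 1. \qquad (28)$$

Proof of relation (27). Since $\mathcal{A}_n \in \mathcal{A}_{1n}$, then $|\mathcal{A}_n| > |\mathcal{A}^0|$. Let us consider the difference

$$BIC(\mathcal{A}_n; (\lambda, \gamma)) - BIC(\lambda_n, \gamma_n) = \log\Big( 1 + \frac{n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)) - n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n))}{n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n))} \Big) + \frac{\log n}{n} S_n \big[ |\mathcal{A}_n| - |\mathcal{A}^0| \big].$$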


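In addition to the index set $\mathcal{A}_n \in \mathcal{A}_{1n}$, let us consider the following sets: $\mathcal{A}_1 = \{ j; \; \hat\beta_{n,j} \neq 0 \}$ and $\mathcal{A}_2 = \{ j; \; \hat\beta_{\mathcal{A}_n, j}(\lambda, \gamma) \neq 0 \}$.
Recall that $\hat{\boldsymbol{\beta}}_n$ is $\hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)$. Since $\mathcal{A}^0 \subset \mathcal{A}_n$, by Theorem 2(i), we have that $\lim_{n \to \infty} P[\mathcal{A}_1 = \mathcal{A}_2 = \mathcal{A}^0] = 1$. Without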

loss of generality, we suppose that AP Ai A2, the other cases are similar. Using the elementary inequality 
\pr(u — v) — Pr{u)\ < |f|, for all u,v gW, we have, with probability one, 

n n 

^■'1 E [PriY^ - Ka 3 aS\i)) - Pr{Y, - X)3(A„,7n))] | < ^ |x^ (3^„(A,7) -3(A„,7n)) I 

i=l i—\ ^ 

- (3^„(A,7) -3(An,7n))_^^ II 2 

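which is, by assumption (A3) and Theorem 1, $o(\alpha_n^{-1}) O_P(\alpha_n) = o_P(1)$. For the second inequality, the estimators
$\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)$ were completed with zeros to obtain a vector of dimension $d_n$. Then

$$\frac{1}{n} \sum_{i=1}^n \rho_\tau\big( Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma) \big) - \frac{1}{n} \sum_{i=1}^n \rho_\tau\big( Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n) \big) \xrightarrow[n \to \infty]{P} 0. \qquad (29)$$

In the same way, we have: $n^{-1} \sum_{i=1}^n \big( \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)) - \rho_\tau(\varepsilon_i) \big) \to 0$ in probability, as $n \to \infty$. On the other hand, by the LLN,

$$\frac{1}{n} \sum_{i=1}^n \rho_\tau(\varepsilon_i) \xrightarrow[n \to \infty]{P} E[\rho_\tau(\varepsilon)] \in (0, \infty). \qquad (30)$$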

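Taking into account (29) and (30), we can apply the inequality $\log(1 + x) \ge -2|x|$ for all $|x| < 1/2$:

$$\log\Big( 1 + \frac{n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)) - n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n))}{n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n))} \Big) \ge -2 \, \frac{\big| n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)) - n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n)) \big|}{n^{-1} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_i^t \hat{\boldsymbol{\beta}}(\lambda_n, \gamma_n))}. \qquad (31)$$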


But, by the proof of Theorem [I] relation dTlTl . we have, with probability tending to 1: n-Vr(P.-X‘,^„/3^„(A,7))- 
n~^ X)r=i PriYi — X-,3(A„,7„)) = Ca^, with C > 0 , for n large enough. Using (lOTll . we have: 

min {BIC{An-, (A, 7)) - BIC{Xn,ln)) > min ( - C^|A„| + (|A„| - |A°|)) > C > 0, 










with $C > 0$. So, relation (27) is proved.


Proof of relation (28). Let the index sets be $\mathcal{A}_n \in \mathcal{A}_{2n}$ and $\tilde{\mathcal{A}}_n = \mathcal{A}_n \cup \mathcal{A}^0$.

Let /3^„(A,7) be the estimator of dimension \An\ built on the variables Let also (3^^^ equal to /3_4^(A,7) on An 

and completed with 0 to obtain a vector of dimension \An\- Then, denoting 6° = minjg _40 |/3°| > 0, we have 



Then, since Pr(-) is convex, we have that there exists /3_; 
XU„3^„(A,7)) > Eti Pr(Y. - Ka^aJ- Thus 


I, with ||/3j - 13j II 2 = 6°, such that Pr{Yi - 




2=1 


A-P )An^Bs{An) i/3-l3°)j^^eBs{An) 

-HGuAr, ((/3 - /3“)^J] I - (3^„ - /3TJ] , 

^hh (3_4„ - (3jJ = Efci [Pr{Y^ - X^^3^„) - Pr{£z)] and {{(3 - (3°)xJ defined similarly. 

As for the calculation of relation (IT9)) . we have, with probability converging to 1: 

~ HGn,AMf^-f^°)Aj]>Gn{b°r, 

iP-P°)^^€Bs{A„) 


with C > 0 and 

Gn,AAh^-f^lJ=GAdn). (32) 

By Lemma [ 2 ] we have: 

K,a. ((/3 - ^°)aJ - dE[G^,An ((^ - /3°)aJ] I = Op(4/^u(i+“)/2). 

(P-P°)An^Bs{A„) 

Then, with probability converging to 1, as n —>■ 00 , we have TZ > G{lPY ~ _ dnn~^. Taking into 

account the assumption nG~^'>A(jG‘^ —^ 0, we have that, for n large enough, with probability converging to 1, 

7^ > C{h°f > Cl > 0. (33) 


Hence 


min [BIG{An\{\l)) - BIC{An-,{\l))\ = (IXI - |A„|) 

An&A^n ^ ^ ' n 

Yr,=, Pr{Y. - K^IaJaSKi)) - Etl Pr{Y. - 


+ min log (1 + 

An&A2r. ^ 


T:=,PriY.-Xl^JjJ 


which is, with probability converging to 1, using 


> min min (log2, 

An&Ar ' 


Cl 


n-^EtiPAY.-^lj^Aj 


)-K 


0 logn 


Sn > 0. 


The last inequality (> 0) results from (15^ together with ]E[pn{£)] G (0,oo). 
As for relation H27I) . we can prove, with probability tending to 1, for n —>■ 00 : 


(34) 


BIG{An;iX,-f)) > min BIGiA'n, > BIG{Xn,'yn), 

A'^ 

A^ ^A\ri {2Sn') 


(35) 
with $\mathcal{A}_{1n}(2 s_n) = \{ \mathcal{A}_n; \; \mathcal{A}^0 \subset \mathcal{A}_n, \; \mathcal{A}^0 \neq \mathcal{A}_n, \; |\mathcal{A}_n| \le 2 s_n \}$, $s_n = O(n^a)$, $a \in (0, 1/2)$. Then, with probability tending to 1,
using (1551) and (IMl) . we have 

min BIC{An;iX,j)) - BIC{Xn,jn) 

= min [BIC{An; (A, 7 )) - BIC{An] (A, 7 )) + BIC{An\ (A, 7 )) - BIC{A°-, (A„, 7 „))] 

> min [BIC{An;iX,'l)) - BIC{An;iX,'l))] > 0 

AneA2„ 

and relation (28) is proved. ■


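Theorem 3 implies that we can choose the tuning parameters $(\lambda_n, \gamma_n)$ as follows:

$$(\hat{\mathcal{A}}_n, \hat\lambda_n, \hat\gamma_n) = \arg\min_{\mathcal{A}_n \subseteq \{1, \dots, d_n\}, \, (\lambda, \gamma) \in (0, \infty)^2, \, |\mathcal{A}_n| \le s_n} BIC(\mathcal{A}_n; (\lambda, \gamma)),$$

choosing some $s_n$ such that $s_n = O(n^a)$, $a \in (0, 1/2)$. Obviously $\hat{\mathcal{A}}_n = \mathcal{A}^0$ with a probability tending to 1. Then, in
applications, we must first fix $\mathcal{A}_n$, $\lambda$, $\gamma$ and calculate:

$$\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma) = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^{|\mathcal{A}_n|}} Q_n(\boldsymbol{\beta}) = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^{|\mathcal{A}_n|}} \Big( \frac{1}{2n} \sum_{i=1}^n \rho_\tau(Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \boldsymbol{\beta}) + \frac{\lambda}{\log 2} \sum_{j \in \mathcal{A}_n} \log\Big( \frac{|\beta_j|}{|\beta_j| + \gamma} + 1 \Big) \Big).$$

Afterwards, we vary $\mathcal{A}_n$, $\lambda$, $\gamma$ on a grid and take as $\hat\lambda_n$, $\hat\gamma_n$ and $\hat{\mathcal{A}}_n$:

$$(\hat{\mathcal{A}}_n, \hat\lambda_n, \hat\gamma_n) = \arg\min_{\mathcal{A}_n \subseteq \{1, \dots, d_n\}, \, (\lambda, \gamma) \in (0, \infty)^2, \, |\mathcal{A}_n| \le s_n} \Big( \log\Big( \frac{1}{n} \sum_{i=1}^n \rho_\tau\big( Y_i - \mathbf{X}_{i,\mathcal{A}_n}^t \hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma) \big) \Big) + \frac{\log n}{n} S_n \|\hat{\boldsymbol{\beta}}_{\mathcal{A}_n}(\lambda, \gamma)\|_0 \Big).$$

In this way, we simultaneously estimate the best tuning parameters $\lambda_n$ and $\gamma_n$ and the parameters $\boldsymbol{\beta}$ that have components
different from 0, such that the corresponding index set $\hat{\mathcal{A}}_n$ is equal to $\mathcal{A}^0$, with probability tending to 1.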
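
The selection procedure described above can be sketched as a grid search. The Python sketch below is a simplified illustration under several assumptions: only $(\lambda, \gamma)$ is varied (the index set is taken to be the support of each fitted vector rather than searched over separately), the complexity weight is read as `s_n * log(n) / n`, and the solver is passed in as a hypothetical callable `fit(X, y, tau, lam, gamma)` returning a list of coefficients, since no algorithm for computing the estimator is provided in the paper.

```python
import itertools
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def bic(residuals, n_nonzero, tau, s_n=1.0):
    """log(mean check loss) + (assumed weight s_n * log(n) / n) * number of nonzero coefficients."""
    n = len(residuals)
    mean_loss = sum(check_loss(r, tau) for r in residuals) / n
    if mean_loss <= 0:
        raise ValueError("mean check loss must be positive to take its logarithm")
    return math.log(mean_loss) + s_n * math.log(n) / n * n_nonzero


def select_tuning(X, y, tau, lambdas, gammas, fit, s_n=1.0):
    """Return (score, lam, gamma, beta) minimizing the criterion over the grid.

    `fit` is a user-supplied (hypothetical) solver: fit(X, y, tau, lam, gamma) -> list of floats.
    """
    best = None
    for lam, gamma in itertools.product(lambdas, gammas):
        beta = fit(X, y, tau, lam, gamma)
        residuals = [yi - sum(xij * bj for xij, bj in zip(xi, beta))
                     for xi, yi in zip(X, y)]
        n_nonzero = sum(1 for b in beta if b != 0)
        score = bic(residuals, n_nonzero, tau, s_n)
        if best is None or score < best[0]:
            best = (score, lam, gamma, beta)
    return best
```

The selected index set is then the support of the returned coefficient vector. Passing the solver as an argument keeps the sketch independent of how the nonconvex minimization is actually carried out.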


Remark 3 Theorem 3 is the equivalent of Theorem 2 of [Li et al. (2012)], where the seamless $L_0$ penalized likelihood
approach is considered, or of Theorem 2 of [Dicker et al. (2013)], for the seamless $L_0$ penalized LS approach.

In [Lee et al. (2014)], a BIC criterion is proposed to select the significant predictor variables in a high-dimensional
quantile model.

Remark 4 The algorithmic and numerical part is a very difficult task, since $G_n(\boldsymbol{\beta}_n)$, defined by (5), is convex in $\boldsymbol{\beta}_n$ and
the penalty $Pen(\boldsymbol{\beta}_n) = \sum_{j=1}^{d_n} p_{SELO}(\beta_j)$ is concave, both being continuous but not differentiable in $\boldsymbol{\beta}_n$. The
same type of problem as ours, but with the process $G_n(\boldsymbol{\beta}_n)$ a likelihood (hence differentiable in $\boldsymbol{\beta}_n$), was analyzed
by [Dicker et al. (2013)]. They propose the coordinate descent algorithm to solve the optimization problem. For the
method proposed in the present paper, further work should be conducted on numerical methods in order to find the seamless
$L_0$ quantile estimator and the tuning parameters using the criterion given by Theorem 3.


4 Lemmas 


Lemma 1 Let $g : \mathbb{R} \to \mathbb{R}$ be the function defined by $g(x) = \log(h(x) + 1)$, with the function $h : \mathbb{R} \to \mathbb{R}$,
$h(x) = \frac{|x|}{|x| + \gamma_n}$. Then, for all $x_1, x_2, C \in \mathbb{R}$ such that $|x_1|, |x_2| > C > 0$ and $|x_1| - |x_2| = o(1)$, there
exists $\tilde{C} > 0$ such that: $g(x_2) - g(x_1) = \tilde{C} \gamma_n (|x_2| - |x_1|) (-1)^{sgn(x_1 x_2)}$.

Proof. By elementary calculus we have

$$h(x_2) - h(x_1) = \gamma_n \frac{(|x_2| - |x_1|) (-1)^{sgn(x_1 x_2)}}{(|x_1| + \gamma_n)(|x_2| + \gamma_n)}.$$















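Then, taking into account the fact that for $|x| \to 0$ we have $\log(x + 1) \sim x$, the lemma follows. ■

Let us consider the following notations:

$$\boldsymbol{\theta} = (\boldsymbol{\beta} - \boldsymbol{\beta}^0)_{\mathcal{A}_n},$$

$$g_{\mathcal{A}_n}(\varepsilon_i, \boldsymbol{\theta}) = \rho_\tau(\varepsilon_i - \mathbf{X}_{i,\mathcal{A}_n}^t \boldsymbol{\theta}) - \rho_\tau(\varepsilon_i) - E\big[ \rho_\tau(\varepsilon_i - \mathbf{X}_{i,\mathcal{A}_n}^t \boldsymbol{\theta}) - \rho_\tau(\varepsilon_i) \big], \qquad \forall i = 1, \dots, n,$$

$$\mathcal{B}_\delta(\mathcal{A}_n) = \{ \boldsymbol{\theta} \in \mathbb{R}^{|\mathcal{A}_n|}; \; \|\boldsymbol{\theta}\|_2 \le \delta \}, \qquad \forall \delta > 0.$$

Lemma 2 Under assumptions (A2), (A3), if $s_n = O(n^a)$, with $a \in (0, 1/2)$, then, for any $\delta > 0$, we have

$$\sup_{\mathcal{A}_n; \, |\mathcal{A}_n| \le s_n} \; \sup_{\boldsymbol{\theta} \in \mathcal{B}_\delta(\mathcal{A}_n)} \Big| \sum_{i=1}^n g_{\mathcal{A}_n}(\varepsilon_i, \boldsymbol{\theta}) \Big| = O_P\big( d_n^{1/2} n^{(1+a)/2} \big).$$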

Proof. The proof is similar to that of Lemma A. 3 of Lee et al. (2014)| . We consider for fc > 1, 0n(2~^S, An) a grid 
of points in Bs(A„) such that for any 9 G Bs{An) there exists G Qn{2~^6,An) such that ||0 — < (5/2^. If 

we denote AI = maxi<i<d„ ||Xi|| 2 , then, for a given constant Ci > 0, let we consider the natural number: 

Kn = min (fc > 1; 

Using the fact that for any rt, u G K: |pr(w — v) — Pt{u)\ < 1'^!, then, we have with probability 1: 

Cl 


IX! [9An{£i,d) - gAr,i£i,0^^"'^)] \ < -;^nC2|yl„|i/2g;i/2_ 
eeBsiA„) ^ 


(36) 


Denote 


Pi=P[ sup sup |^ 5 ^„(£i,e)| > Cn(i+“)/24C]. 


A^ eeBsiAn) 

l^n |<Sn 


Inequality (1551) implies 


Pi<P[ sup sup |X5 .a„(£.,0^'^"^)| > 


A„ e&BiiAn.) 

On the other hand, for the cardinality Nk{An) of 0„(2“^5,^n) , we have: Nk{An) < (1 + 4 • 2^)l-^"l. Then 
P, < y: n-[ sup f:i9ul,fe,(»‘‘»)-9.4.fe,«<*-‘>)l>/(n<'+“'''Xdl 

^„,|.4„|<s„ SeBsiA^) 

K„ 


(37) 


^ E E Nk{An)Nk-i{An)max*f’[\^[\gA„{£i,0^‘"'^) - gAr^iei^O^'" ^^)]| > —, 

, I-Ati I <-571 1 i—1 

yK-n 


with pk > 0, Y(jk=i'9k < 1- The max* is calculated over all 0*-^^ G 0n{2~^5,An) and G 0„(2“''+^i5,.A„), with 

|| 0 (^) _ ^(^-1) II 2 < 3 • 2~^8. Moreover, by assumption (A2) we have: n~^ 1^! An. ~ < 18Ao2“^^5^. 

We take pk = max (2“^fc^/^/8, log^^^(l + 4 • 2^)). By the Hoeffding inequality, we ob¬ 

tain: 


Kn 


Vi < E E exp (2s„log(l -I- 4 • 2^=) - 

.A„;|^„|<s„ fc=l 


C‘^pln°‘d„ 


Kn 


482 • i?o • 2-2fc,52 


) < 2 ^ E (- 

fc=i 


C‘^kn°‘d„ 


) 


2 • 82 • 482 • i?o • <52 
(38) 

the last relation following from the fact that $s_n = O(n^a)$. The lemma follows from relations (37) and (38). ■

